AMENDMENTS TO THE CLAIMS

This listing of claims will replace all prior versions, and listings, of claims in the
application. Claims 1-4, 8, 17, 19-21, 26, 30, 31, 37, 42, 44, 53, 54, and 62-64 have been
amended herein.

Listing of Claims:

1. (Currently amended): A computer implemented system that facilitates
building a statistical model for a computer readable data set, comprising:

a first training algorithm that efficiently builds a rough model from a
subset of the computer readable data set;

an evaluation component that determines whether the subset of the
computer readable data set for which the first model was built is an
appropriate subset to build a model for the computer readable data set; and

a second training algorithm that builds a refined model for the computer
readable data set from the subset if deemed appropriate by the evaluation
component.

2. (Currently amended): The system of claim 1, further comprising a data
scheduler which, based on a data policy, controls the size of subsets for
which the first training algorithm is applied.

3. (Currently amended): The system of claim 2, wherein the data scheduler
increases the size of the subset to provide a larger aggregate subset of the
data set if the rough model is unacceptable, the first training algorithm
efficiently builds the rough model for each larger aggregate subset of the
data until the evaluation component determines the resulting rough model to
be acceptable.

4. (Currently amended): The system of claim 3, wherein the acceptability of
each rough model is determined based on a stopping criterion functionally related to
an expected incremental benefit and a cost associated with increasing the size of the
aggregate subset of the data set.

5. (Original): The system of claim 4, wherein the cost of the stopping 
criterion is functionally related to at least one of time associated with evaluating an 
aggregate data subset of increased size and size of the aggregated subset of the data. 
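
By way of illustration only, the cost-benefit stopping rule recited in claims 4 and 5 might be sketched as follows. The ratio form, the names `incremental_cost` and `should_stop`, and the linear per-record cost are assumptions made for this sketch; it is not the criterion defined in claims 6 and 7.

```python
def incremental_cost(added_records, seconds_per_record):
    """Hypothetical cost of growing the subset: time grows with the records added."""
    return added_records * seconds_per_record


def should_stop(loglik_current, loglik_previous, cost, threshold):
    """Stop growing the subset when the benefit per unit of cost falls below threshold.

    Benefit is taken here as the improvement in holdout log likelihood between
    the previous and the current subset (an assumption of this sketch).
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    benefit = loglik_current - loglik_previous
    return benefit / cost < threshold


# Growing the subset by 1,000 records at a hypothetical 0.05 s per record.
cost = incremental_cost(added_records=1000, seconds_per_record=0.05)
small_gain = should_stop(-100.0, -101.0, cost, threshold=0.05)  # log likelihood gain of 1.0
large_gain = should_stop(-100.0, -110.0, cost, threshold=0.05)  # log likelihood gain of 10.0
```

In this sketch a small gain relative to the cost ends the growth of the subset, while a large gain justifies a further increase in size.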

6. (Original): The system of claim 4, wherein the stopping criterion is
defined by 

f 1{Pho 1 )) ~ KDyo I )) ) 1 < 

where

$l(D_{HO} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{HO} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{HO} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
training algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second training algorithm, when applied
to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first training algorithm when applied
to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

7. (Original): The system of claim 4, wherein the stopping criterion is
defined by


where

$l(D_{HO} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{HO} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{HO} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$\delta$ is an offset associated with a difference in log likelihood for holdout
data when evaluated for models built on a first subset of the training data set by
the respective first and second training algorithms,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
training algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second training algorithm, when applied
to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first training algorithm when applied
to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

8. (Currently amended): The system of claim 1, wherein the first training
algorithm further comprises an iterative algorithm, which builds the
rough model for the subset of the data set according to an associated training policy.

9. (Original): The system of claim 8, wherein the first training algorithm 
further comprises an associated training policy that defines parameter initialization of the 
first training algorithm for each subset of the data set. 

10. (Original): The system of claim 9, wherein the training policy associated
with the first training algorithm further controls parameter initialization of the first 
training algorithm, such that at least some of the parameters computed for a previous 
subset of the data are employed to initialize the first training algorithm for a subsequent 
larger aggregate subset of the data. 
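
The two parameter-initialization policies of claims 10 and 11 can be contrasted with a small sketch. The estimator `fit_mean` is a toy stand-in invented for illustration (it steps an estimate toward the sample mean), and the data values are made up; the sketch only suggests why reusing parameters from a previous subset may shorten training on the larger subset.

```python
def fit_mean(data, start, tol=1e-6, step=0.5, max_iters=1000):
    """Toy iterative estimator: move the estimate toward the sample mean.

    Returns the estimate and the number of iterations used.
    """
    target = sum(data) / len(data)
    mu = start
    for iteration in range(1, max_iters + 1):
        update = step * (target - mu)
        mu += update
        if abs(update) < tol:
            return mu, iteration
    return mu, max_iters


data = [4.0, 6.0, 5.5, 4.5, 5.3, 4.9, 5.2, 5.0]
previous_subset = data[:4]   # smaller subset processed first
larger_subset = data         # subsequent larger aggregate subset

previous_mu, _ = fit_mean(previous_subset, start=0.0)

# Same initial value for every subset (compare claim 11).
cold_mu, cold_iters = fit_mean(larger_subset, start=0.0)

# Parameters from the previous subset seed the next run (compare claim 10).
warm_mu, warm_iters = fit_mean(larger_subset, start=previous_mu)
```

Both runs reach essentially the same estimate; in this toy setting the run seeded from the previous subset needs fewer iterations.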

11. (Original): The system of claim 9, wherein the first training algorithm is
initialized by the same parameter values for each subset of the data subset. 

12. (Original): The system of claim 9, wherein the training policy sets the 
iterative algorithm to perform a fixed number of at least one iteration. 

13. (Original): The system of claim 12, wherein the training policy sets the 
iterative algorithm to perform a single iteration. 

14. (Original): The system of claim 12, wherein the second training algorithm 
further comprises an iterative algorithm that operates according to an associated training 
policy, so as to produce a more accurate model for the appropriate subset of the data set 
than the first training algorithm.

15. (Original): The system of claim 14, wherein the iterative algorithm
associated with at least one of the first and second training algorithms is an Expectation 
and Maximization algorithm. 

16. (Original): The system of claim 8, wherein the training policy associated 
with the iterative algorithm of the first training algorithm controls the iterative algorithm 
to run until an associated convergence criterion is satisfied. 

17. (Currently amended): The system of claim 16, wherein the second training
algorithm further comprises an iterative algorithm, which builds the
refined model for the appropriate subset of the data set according to an associated training
policy.

18. (Original): The system of claim 17, wherein the training policy associated 
with the iterative algorithm of the second training algorithm controls the respective 
iterative algorithm to run until an associated convergence criterion is satisfied, wherein 
the convergence criterion associated with the second training algorithm provides 
improved model quality relative to the convergence criterion associated with the first 
training algorithm. 

19. (Currently amended): A computer implemented system programmed to
facilitate building a statistical model, comprising:

a first parameter estimation algorithm that efficiently builds a rough model
from a subset of a computer readable data set based on a training policy
associated therewith; and

an evaluation component that determines whether the subset of data from
which the rough model was built is an appropriate size for building the
statistical model to characterize the data set;

a second parameter estimation algorithm that builds a refined model for
the data set from the subset if determined to have the appropriate size, the
second parameter estimation algorithm having an associated training policy,
which enables the second parameter estimation algorithm to build a more
accurate model than the first parameter estimation algorithm.

20. (Currently amended): The system of claim 19, further comprising a data
scheduler that increases the size of the subset of the data set to
provide a larger aggregate subset of the data set if the rough model is unacceptable, the
first parameter estimation algorithm efficiently builds a rough model for each
larger aggregate subset until a resulting rough model built
therefrom is determined to be acceptable.

21. (Currently amended): The system of claim 19, wherein the first parameter
estimation algorithm further comprises an iterative algorithm that builds
the rough model for each subset of the data set according to the associated training
policy.

22. (Original): The system of claim 21, wherein the training policy for the
first parameter estimation algorithm is operative to control parameter initialization for the 
first parameter estimation algorithm, such that at least some of the parameters computed 
for a previous subset of the data are employed to initialize the first parameter estimation 
algorithm for a subsequent larger aggregate subset of the data set. 

23. (Original): The system of claim 21, wherein the first parameter estimation 
algorithm is initialized by the same parameter values for each subset of the data subset. 

24. (Original): The system of claim 21, wherein the training policy associated 
with first parameter estimation algorithm controls the iterative algorithm of the first 
parameter estimation algorithm to perform a fixed number of at least one iteration, the 
second training algorithm further comprising an iterative algorithm, which is operative to 
perform a greater number of iterations than the iterative algorithm of the first training 
algorithm based on a training policy associated with the second parameter estimation 
algorithm. 






09/873.719 



MS1S8346.01/MSFTPI84US 



25. (Original): The system of claim 21, wherein the training policy associated 
with the iterative algorithm of the first parameter estimation algorithm controls the 
iterative algorithm to run until an associated convergence threshold is satisfied, wherein 
the second training algorithm further comprises an iterative algorithm, the training policy 
associated with the iterative algorithm of the second parameter estimation algorithm 
being operative to control the respective iterative algorithm to run until an associated 
convergence threshold is satisfied, the convergence threshold associated with the second 
parameter estimation algorithm is less than the convergence threshold associated with the 
first parameter estimation algorithm. 
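
The effect of two convergence thresholds, the second smaller than the first, can be illustrated with a short Expectation and Maximization sketch. The mixture form (two equal-weight, unit-variance components), the function name `em_two_means`, the synthetic data, and the particular threshold values are all assumptions chosen for illustration.

```python
import math
import random


def em_two_means(data, threshold, mu=(-1.0, 1.0), max_iters=500):
    """EM for a two-component, equal-weight, unit-variance 1-D Gaussian mixture.

    Only the two means are estimated. Iteration stops when the log likelihood
    (up to an additive constant) improves by less than `threshold`.
    Returns (means, log_likelihood, iterations).
    """
    def density(x, m):
        return 0.5 * math.exp(-0.5 * (x - m) ** 2)

    def loglik(m0, m1):
        return sum(math.log(density(x, m0) + density(x, m1)) for x in data)

    mu0, mu1 = mu
    previous = loglik(mu0, mu1)
    for iteration in range(1, max_iters + 1):
        # E-step: responsibility of the second component for each point.
        resp = [density(x, mu1) / (density(x, mu0) + density(x, mu1)) for x in data]
        # M-step: responsibility-weighted means.
        mu0 = sum((1 - r) * x for r, x in zip(resp, data)) / sum(1 - r for r in resp)
        mu1 = sum(r * x for r, x in zip(resp, data)) / sum(resp)
        current = loglik(mu0, mu1)
        if current - previous < threshold:
            return (mu0, mu1), current, iteration
        previous = current
    return (mu0, mu1), previous, max_iters


rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(2.0, 1.0) for _ in range(200)]

rough = em_two_means(data, threshold=1.0)      # looser first convergence threshold
refined = em_two_means(data, threshold=1e-8)   # smaller second convergence threshold
```

Because both runs follow the same sequence of updates, the run with the smaller threshold stops no earlier than the run with the larger one and ends with a log likelihood at least as high.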

26. (Currently amended): The system of claim 19, wherein the evaluation
component determines whether the subset of data for which the rough model
was built is an appropriate size based on a stopping criterion, which is functionally
related to an expected incremental benefit and an expected incremental cost associated
with increasing size of the subset of data.

27. (Original): The system of claim 26, wherein the cost of the stopping 
criterion is functionally related to at least one of time associated with evaluating the 
model for a larger subset of data and size of the larger subset of the data.

28. (Original): The system of claim 26, wherein the stopping criterion is 
defined by 

( l(D H0 \9(D n ))-l(D ff0 |tf(D^)) ) l 



where

$l(D_{HO} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{HO} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{HO} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

29. (Original): The system of claim 26, wherein the stopping criterion is 
defined by 

[l(D HO | 0(Dj) + 8-l(D HO I e SASE (D n ))j qtf -/,) | AD„ I -^(J, -j H )+ Cr r m | D flTi I +cj m +<v 
where

$l(D_{HO} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{HO} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{HO} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$\delta$ is an offset associated with a difference in log likelihood for holdout
data when evaluated for models built on a first subset of the training data set by
the respective first and second training algorithms,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first data subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to a first data subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

30. (Currently amended): A computer implemented learning curve method to
facilitate building a statistical model, comprising:

choosing a subset of a computer readable data set;

employing a first training algorithm to build a rough model to
characterize the subset;

evaluating the rough model;

if the rough model is unacceptable, repeatedly increasing the size of
the subset of data to provide an aggregate data set, building another rough model to
characterize the aggregate subset, and reevaluating the model; and

if the model is acceptable, employing a second training algorithm to build
a refined model based on the aggregate data set, the second training algorithm
being different from the first training algorithm.
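
The steps recited in claim 30 can be sketched end to end as follows. Everything concrete here is an assumption made for illustration: the toy trainer `fit_mean`, the unit-variance Gaussian scoring in `holdout_loglik`, the doubling schedule, and the simple acceptance test based on a minimum gain in holdout log likelihood.

```python
import math
import random


def fit_mean(data, start, max_iters, tol=0.0, step=0.5):
    """Toy iterative trainer: step the estimate toward the sample mean."""
    target = sum(data) / len(data)
    mu = start
    for _ in range(max_iters):
        update = step * (target - mu)
        mu += update
        if abs(update) <= tol:
            break
    return mu


def holdout_loglik(mu, holdout):
    """Average log likelihood of the holdout data under a unit-variance Gaussian."""
    const = -0.5 * math.log(2 * math.pi)
    return sum(const - 0.5 * (x - mu) ** 2 for x in holdout) / len(holdout)


def learn(train, holdout, first_size=10, growth=2, min_gain=1e-3):
    size = min(first_size, len(train))
    previous_score = None
    while True:
        subset = train[:size]                              # choose or grow the subset
        rough = fit_mean(subset, start=0.0, max_iters=3)   # first, cheaper training algorithm
        score = holdout_loglik(rough, holdout)             # evaluate the rough model
        acceptable = previous_score is not None and score - previous_score < min_gain
        if acceptable or size == len(train):
            break
        previous_score = score
        size = min(size * growth, len(train))
    # Second, more thorough training algorithm on the accepted aggregate subset.
    refined = fit_mean(subset, start=rough, max_iters=1000, tol=1e-9)
    return size, rough, refined


rng = random.Random(0)
values = [rng.gauss(5.0, 1.0) for _ in range(2000)]
holdout, train = values[:200], values[200:]
size, rough, refined = learn(train, holdout)
```

The rough model is deliberately under-trained (three iterations), so the refined model, run to a tight tolerance on the same subset, lands closer to that subset's sample mean.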

31. (Currently amended): The method of claim 30, further comprising
determining the acceptability of each rough model based on a stopping criterion
functionally related to an expected incremental benefit and an expected incremental cost
associated with increasing the size of the aggregate subset of the data set.

32. (Original): The system of claim 31, wherein the cost of the stopping
criterion is functionally related to at least one of time associated with evaluating an 
aggregate data subset of increased size and size of the aggregate subset of the data. 

33. (Original): The system of claim 31, wherein the stopping criterion is 
defined by 

( KD^mD^-KD^mD^)) ^ i 

U**, ITO-I(D«IW^))J«i« A£U -7,)+^. \D n+l |^ 2 7 B +c 3 < 
where

$l(D_{HO} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{HO} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{HO} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is a number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is a size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is an increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

34. (Original): The system of claim 31, wherein the stopping criterion is
defined by 



1 



f KD H o\0(D H ))-l(D ffO \0(D^)) 

{j(D m | e(D n ))+6-!(D H0 | <WZ>J)J 0,(1, -/J | AD^ | --O+c^ \ D ml \ + c 3 
where

$l(D_{HO} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{HO} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{HO} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$\delta$ is an offset associated with the difference in log likelihood for holdout
data when evaluated for models built on a first subset of the training data set by
the respective first and second training algorithms,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first data subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to a first data subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is a number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is a size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is an increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

35. (Original): The method of claim 30, wherein the first training algorithm is 
more computationally efficient than the second training algorithm.

36. (Original): The method of claim 30, wherein each instance of model
building repeated until obtaining an acceptable model by the first training algorithm 
employs more efficient and less accurate model building than model building employed 
by the second training algorithm that occurs after obtaining the acceptable model. 

37. (Currently amended): The method of claim 36, wherein each instance of 
model building repeated until obtaining an acceptable model employs the first training 
algorithm as an iterative algorithm that is run to a first convergence criterion, the second 
training algorithm employing an iterative algorithm that is run to a second convergence 
criterion, which demands more iterations than the first convergence criterion in order to 
obtain convergence, so that the refined model is more accurate than the rough
model built by the first training algorithm. 

38. (Original): The method of claim 36, wherein each instance of model 
building repeated until obtaining an acceptable model employs an iterative algorithm 
having a fixed number of at least one iteration, the second training algorithm employing 
an iterative algorithm having a greater number of iterations than the fixed number. 

39. (Original): The method of claim 30, further comprising controlling 
parameter initialization employed in each instance of building a model for the aggregate 
data set prior to obtaining an acceptable model. 

40. (Original): The method of claim 39, further comprising initializing the 
first training algorithm by the same parameter values for each subset. 

41. (Original): The method of claim 39, wherein the controlling further 
comprises reusing at least some of the parameters computed from a previous instance of 
model building to initialize a subsequent instance of model building for a subsequent 
larger aggregate data set prior to obtaining an acceptable model.

42. (Currently amended): A computer-readable medium having
computer-executable instructions for:

choosing a subset of a computer readable data set;

building a rough model to characterize the subset based on an associated
training policy;

evaluating the rough model;

if the rough model is unacceptable, repeatedly increasing the size of the
subset of data to provide an aggregate data set, building a rough model to characterize the
aggregate subset based on an associated training policy, and reevaluating the rough
model; and

building a refined model for the computer readable data set from the
aggregate data set if the rough model is determined to be acceptable based on an
associated training policy.

43. (Original): The method of claim 42, further comprising determining the 
acceptability of the model based on an expected incremental benefit relative to an 
expected incremental cost associated with increasing the size of the aggregate data set. 

44. (Currently amended): A computer implemented method to facilitate 
constructing a statistical model, comprising: 

separating computer readable data into holdout data and training data; 

determining a data subset from the training data by estimating model 
parameters according to a first training policy and evaluating the estimated model 
parameters relative to the holdout data set and repeating the estimation and evaluation of 
model parameters with a larger subset of the training data until an acceptable quality of 
the estimated model is established; and

subsequent to establishing the acceptable quality of the estimated model,
using the determined data subset to improve the estimated model parameters by 
employing a second training policy that is more accurate than the first training policy. 

45. (Original): The method of claim 44, wherein each estimation of model 
parameters repeated until the acceptable quality of the estimated model is established 
further comprises employing an iterative algorithm that is run until a first convergence 
criterion is satisfied, the estimation of model parameters using the determined data subset 
further comprising an iterative algorithm that is run until a second convergence criterion 
is satisfied, which is operative to provide a better quality of model than the first 
convergence criterion. 

46. (Original): The system of claim 45, wherein the first convergence 
criterion causes the associated iterative algorithm to run until a first convergence 
threshold is satisfied, wherein the second convergence criterion causes the associated 
iterative algorithm to run until a second convergence threshold is satisfied, the second 
convergence threshold being less than the first convergence threshold.

47. (Original): The method of claim 45, wherein at least one of the iterative 
algorithm run to the first convergence criterion and the iterative algorithm run to the 
second convergence criterion is an Expectation and Maximization algorithm. 

48. (Original): The method of claim 44, wherein each estimation of model
parameters repeated until the acceptable quality of the estimated model is established 
employs an iterative algorithm having a fixed number of at least one iteration, the 
estimation of model parameters using the determined data subset further employing an 
iterative algorithm having a greater number of iterations than the fixed number. 

49. (Original): The method of claim 44, further comprising controlling 
parameter initialization employed in each estimation of model parameters repeated until 
determining an acceptable size for the determined data subset.

50. (Original): The method of claim 44, wherein the controlling further
comprises reusing at least some of the parameters computed from a previous estimation
of model parameters to initialize a subsequent estimation of model parameters for a next
larger subset of the training set.

51. (Original): The method of claim 44, wherein each estimation of model 
parameters repeated until the acceptable quality of the estimated model is established 
further comprises initializing the first training algorithm by the same parameter values. 

52. (Original): The method of claim 44, further comprising determining the 
acceptability of the estimated model based on an expected incremental benefit relative to 
a cost associated with increasing the size of the subset of the data set.

53. (Currently amended): A computer-readable medium having computer- 
executable instructions for: 

separating computer readable data into holdout data and training data; 

determining a data subset from the training data by estimating model 
parameters according to a first training policy and evaluating the estimated model 
parameters relative to the holdout data set and repeating the estimation and evaluation of 
model parameters with a next successively larger subset of the training data set until an 
acceptable quality of the estimated model is established; and 

subsequent to establishing the acceptable quality of the estimated model, 
using the determined data subset to improve the estimated model parameters by 
employing a second training policy that is more accurate than the first training policy.

54. (Currently amended): A computer implemented method to facilitate
constructing a statistical model, comprising:

separating computer readable data into a holdout data set and a training
data set;

iteratively estimating model parameters for a subset of the training data set 
over a fixed number of iterations and evaluating the estimated model parameters relative 
to the holdout data set; 

repeating the estimation and evaluation of model parameters obtained with 
successively larger subsets of the training data set until an acceptable model quality is 
established; and 

after the acceptable model quality is established, iteratively estimating 
model parameters for the data subset, which provided the acceptable model quality, until 
a better quality of model is provided relative to a preceding estimation performed over 
the fixed number of iterations. 

55. (Original): The method of claim 54, wherein at least one of the iterative 
estimations employs an Expectation and Maximization algorithm. 

56. (Original): The method of claim 54, wherein the estimation that occurs 
after the acceptable model quality is established, further comprises employing an iterative 
algorithm having a greater number of iterations than the fixed number. 

57. (Original): The method of claim 54, wherein the estimation of model 
parameters after the acceptable model quality has been established further comprises 
employing an iterative algorithm that is run until a convergence criterion is satisfied, 
which is operative to provide a better quality of model with the data subset than a 
preceding estimation employing the fixed number of iterations. 

58. (Original): The method of claim 54, further comprising controlling 
parameter initialization for each estimation of model parameters that occurs before the 
acceptable model quality has been established.

59. (Original): The method of claim 58, wherein each iterative estimation
until the acceptable model quality is established further comprises initializing the first 
training algorithm by the same parameter values. 

60. (Original): The method of claim 58, wherein the controlling further 
comprises reusing at least some of the parameters obtained in a previous estimation of 
model parameters to initialize a subsequent estimation of model parameters for a next 
larger subset of the training data set. 

61. (Original): The method of claim 54, further comprising determining the
acceptability of the model based on an expected incremental benefit relative to an
expected incremental cost associated with an increase in size of each larger training
subset of the data set.

62. (Currently amended): A computer implemented method to facilitate
constructing a statistical model, comprising:

separating computer readable data into a holdout data set and a training
data set;

iteratively estimating model parameters for a subset of the training data set 
until a first convergence threshold is satisfied and evaluating the estimated model 
parameters relative to the holdout data set; 

repeating the estimation and evaluation of model parameters obtained with 
successively larger subsets of the training data set until determining a size of data subset 
that provides acceptable model parameters; and 

after determining the size of data subset that provides acceptable model 
parameters, iteratively estimating model parameters for a data subset of the acceptable 
size until a second convergence threshold is satisfied, the second convergence threshold 
being less than the first convergence threshold.

63. (Currently amended): A computer implemented system to facilitate
building a statistical model for a computer readable data set, comprising:

first means for building a rough model to characterize a subset of the
computer readable data set;

evaluation means for evaluating the acceptability of the rough model, the
first means building another rough model for a larger subset of the data if the
evaluation means determines that a prior rough model is unacceptable; and

second means, which is different from the first means, for building a
refined model from an aggregate subset of data that yielded the
rough model deemed acceptable by the evaluation means.

64. (Currently amended): A computer implemented system to facilitate 
building a statistical model for a computer readable data set, comprising: 

first means for estimating model parameters from a subset of the computer 
readable data set; 

means for evaluating the estimated model parameters relative to a holdout 
set of the data set; 

means for determining a data subset from the training data by causing the 
first means and the means for evaluating to respectively repeat estimation and evaluation 
of model parameters with a next successively larger subset of the training data set until an 
acceptable quality of the model parameters is established; and 

second means for estimating model parameters based on the determined 
data subset to provide a more accurate estimation of model parameters than the first 
means.